
A Sharp Universality Dichotomy for the Free Energy of Spherical Spin Glasses

Kim, Taegyun

arXiv.org Machine Learning

We study the free energy for pure and mixed spherical $p$-spin models with i.i.d.\ disorder. In the mixed case, each $p$-interaction layer is assumed either to have regularly varying tails with exponent $α_p$ or to satisfy a finite $2p$-th moment condition. For the pure spherical $p$-spin model with regularly varying disorder of tail index $α$, we introduce a tail-adapted normalization that interpolates between the classical Gaussian scaling and the extreme-value scale, and we prove a sharp universality dichotomy for the quenched free energy. In the subcritical regime $α<2p$, the thermodynamics is driven by finitely many extremal couplings and the free energy converges to a non-degenerate random limit described by the NIM (non-intersecting monomial) model, depending only on extreme-order statistics. At the critical exponent $α=2p$, we obtain a random one-dimensional TAP-type variational formula capturing the coexistence of an extremal spike and a universal Gaussian bulk on spherical slices. In the supercritical regime $α>2p$ (more generally, under a finite $2p$-th moment assumption), the free energy is universal and agrees with the deterministic Crisanti--Sommers/Parisi value of the corresponding Gaussian model, as established in [Sawhney-Sellke'24]. We then extend the subcritical and critical results to mixed spherical models in which each $p$-layer is either heavy-tailed with $α_p\le 2p$ or has finite $2p$-th moment. In particular, we derive a TAP-type variational representation for the mixed model, yielding a unified universality classification of the quenched free energy across tail exponents and mixtures.
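For orientation, the classical Gaussian-normalized pure spherical $p$-spin model referenced above takes the following standard form (a sketch of the usual conventions; the paper's tail-adapted normalization modifies the $N^{-(p-1)/2}$ prefactor to match the disorder's tail index $\alpha$, in a form specified there):

```latex
% Hamiltonian of the pure spherical p-spin model with couplings J
H_N(\sigma) = \frac{1}{N^{(p-1)/2}} \sum_{i_1,\dots,i_p=1}^{N}
  J_{i_1 \dots i_p}\, \sigma_{i_1} \cdots \sigma_{i_p},
\qquad \sigma \in S^{N-1}\!\bigl(\sqrt{N}\bigr),

% Quenched free energy at inverse temperature \beta
F_N(\beta) = \frac{1}{N} \log \int_{S^{N-1}(\sqrt{N})}
  e^{\beta H_N(\sigma)}\, \mathrm{d}\sigma .
```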


Using latent representations to link disjoint longitudinal data for mixed-effects regression

Schächter, Clemens, Hackenberg, Maren, Pfaffenlehner, Michelle, Tambe-Ndonfack, Félix B., Schmidt, Thorsten, Pechmann, Astrid, Kirschner, Janbernd, Hasenauer, Jan, Binder, Harald

arXiv.org Machine Learning

Many rare diseases offer limited established treatment options, leading patients to switch therapies when new medications emerge. To analyze the impact of such treatment switches within the low-sample-size limitations of rare disease trials, it is important to use all available data sources. This, however, is complicated when the measurement instruments in use change during the observation period, for example when instruments are adapted to specific age ranges. The resulting disjoint longitudinal data trajectories complicate the application of traditional modeling approaches like mixed-effects regression. We tackle this by mapping observations of each instrument to an aligned low-dimensional temporal trajectory, enabling longitudinal modeling across instruments. Specifically, we employ a set of variational autoencoder architectures to embed item values into a shared latent space for each time point. Temporal disease dynamics and treatment switch effects are then captured through a mixed-effects regression model applied to latent representations. To enable statistical inference, we present a novel statistical testing approach that accounts for the joint parameter estimation of mixed-effects regression and variational autoencoders. The methodology is applied to quantify the impact of treatment switches for patients with spinal muscular atrophy. Here, our approach aligns motor performance items from different measurement instruments for mixed-effects regression and maps estimated effects back to the observed item level to quantify the treatment switch effect. Our approach allows for model selection as well as for assessing effects of treatment switching. The results highlight the potential of modeling in joint latent representations for addressing small data challenges.
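The two-stage idea (instrument-specific encoders into a shared latent space, then mixed-effects regression on the latent trajectories) can be sketched in a toy, noiseless simulation. Everything here is illustrative: linear maps and a pseudo-inverse stand in for the paper's variational autoencoders, and the mixed model is reduced to per-patient centering plus a pooled slope estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, t = 20, np.arange(10.0)
slope = -0.3                                   # shared disease-progression slope
intercepts = rng.normal(0.0, 1.0, n_patients)  # patient-level random effects
Z = intercepts[:, None] + slope * t[None, :]   # latent trajectories (patients x time)

# Two instruments with different item loadings, each used on half the timeline
W1, W2 = rng.normal(size=(5, 1)), rng.normal(size=(8, 1))
items1 = Z[:, :5, None] * W1.T[None]           # instrument 1: first 5 visits
items2 = Z[:, 5:, None] * W2.T[None]           # instrument 2: last 5 visits

# "Encoder": project each instrument's items back to the shared 1-D latent.
# A pseudo-inverse stands in for the paper's variational autoencoders.
z1 = items1 @ np.linalg.pinv(W1.T).ravel()
z2 = items2 @ np.linalg.pinv(W2.T).ravel()
z_aligned = np.concatenate([z1, z2], axis=1)   # disjoint trajectories, now linked

# Mixed-effects regression on the latent: per-patient intercept, shared slope
centered_z = z_aligned - z_aligned.mean(axis=1, keepdims=True)
centered_t = t - t.mean()
slope_hat = (centered_z * centered_t).sum() / (n_patients * (centered_t ** 2).sum())
```

Because the simulation is noiseless, the pooled slope recovers the true progression rate exactly; with noise and nonlinear encoders one would fit the latent trajectories with an actual mixed-effects routine.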


Dynamic Intent Queries for Motion Transformer-based Trajectory Prediction

Demmler, Tobias, Hartung, Lennart, Tamke, Andreas, Dang, Thao, Hegai, Alexander, Haug, Karsten, Mikelsons, Lars

arXiv.org Artificial Intelligence

In autonomous driving, accurately predicting the movements of other traffic participants is crucial, as it significantly influences a vehicle's planning processes. Modern trajectory prediction models strive to interpret complex patterns and dependencies from agent and map data. The Motion Transformer (MTR) architecture and subsequent work define the most accurate methods in common benchmarks such as the Waymo Open Motion Benchmark. The MTR model employs pre-generated static intention points as initial goal points for trajectory prediction. However, the static nature of these points frequently leads to misalignment with map data in specific traffic scenarios, resulting in unfeasible or unrealistic goal points. Our adaptation of the MTR model, which replaces the static intention points with dynamically generated ones, was trained and evaluated on the Waymo Open Motion Dataset. Our findings demonstrate that incorporating dynamic intention points has a significant positive impact on trajectory prediction accuracy, especially for predictions over long time horizons. Furthermore, we analyze the impact on ground truth trajectories that are not compliant with the map data or constitute illegal maneuvers. Trajectory prediction is crucial for modern autonomous driving systems: it forms a deeper understanding of how other traffic participants will move in the future, which is the basis for the subsequent motion planning of the autonomous vehicle.
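The misalignment problem can be made concrete with a toy sketch: snapping off-map static anchors to the nearest lane-centerline points. This is only an illustration of why map-adapted goal points help; the actual model generates dynamic intent queries inside the transformer rather than snapping points post hoc.

```python
import numpy as np

# Toy straight lane centerline along the x-axis, sampled every 0.5 m
lane = np.stack([np.linspace(0.0, 50.0, 101), np.zeros(101)], axis=1)

# Static intention points from a pre-generated grid; all lie off the lane
static_pts = np.array([[10.0, 5.0], [25.0, -4.0], [40.0, 8.0]])

def snap_to_lane(points, centerline):
    # Replace each static anchor with its nearest on-lane point
    d = np.linalg.norm(points[:, None, :] - centerline[None, :, :], axis=2)
    return centerline[d.argmin(axis=1)]

dynamic_pts = snap_to_lane(static_pts, lane)
```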


Adaptive Variational Inference in Probabilistic Graphical Models: Beyond Bethe, Tree-Reweighted, and Convex Free Energies

Leisenberger, Harald, Pernkopf, Franz

arXiv.org Machine Learning

Variational inference in probabilistic graphical models aims to approximate fundamental quantities such as marginal distributions and the partition function. Popular approaches are the Bethe approximation, tree-reweighted, and other types of convex free energies. These approximations are efficient but can fail if the model is complex and highly interactive. In this work, we analyze two classes of approximations that include the above methods as special cases: first, if the model parameters are changed; and second, if the entropy approximation is changed. We discuss benefits and drawbacks of either approach, and deduce from this analysis how a free energy approximation should ideally be constructed. Based on our observations, we propose approximations that automatically adapt to a given model and demonstrate their effectiveness for a range of difficult problems.
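As a minimal concrete instance of the family of free-energy approximations discussed here, the sketch below compares the naive mean-field free energy (the simplest entropy approximation) with the exact log-partition function of a tiny pairwise model; the Bethe, tree-reweighted, and convex variants refine the entropy term in the same objective. All model parameters are made up for illustration.

```python
import itertools
import numpy as np

# Tiny pairwise model: p(x) ∝ exp(sum_i h_i x_i + sum_{i<j} J_ij x_i x_j), x_i ∈ {-1,+1}
n = 4
rng = np.random.default_rng(1)
h = rng.normal(0, 0.1, n)
J = np.triu(rng.normal(0, 0.2, (n, n)), 1)

def energy(x):
    return h @ x + x @ J @ x

# Exact log partition function by enumeration (feasible only for tiny n)
states = np.array(list(itertools.product([-1, 1], repeat=n)))
logZ = np.log(np.exp([energy(x) for x in states]).sum())

def mf_lower_bound(m):
    # ELBO under a product (mean-field) distribution with means m; always <= logZ
    m = np.clip(m, -0.999, 0.999)
    avg_energy = h @ m + m @ J @ m
    p = (1 + m) / 2
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p)).sum()
    return avg_energy + entropy

# Coordinate-ascent mean-field updates
m = np.zeros(n)
for _ in range(100):
    for i in range(n):
        m[i] = np.tanh(h[i] + (J[i] + J[:, i]) @ m)
```

For weakly coupled models the gap between `mf_lower_bound(m)` and `logZ` is small; on complex, highly interactive models it blows up, which is exactly the failure mode that motivates adapting the approximation to the model.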


Reviews: Statistical-Computational Tradeoff in Single Index Models

Neural Information Processing Systems

The paper first introduces the first- and second-order Stein's identities and then defines two function classes, $\mathcal{C}_1$ and $\mathcal{C}_2$, characterized by the covariance between $f$ and $X^\top\beta^*$. Further, the authors define a common function class $\mathcal{C}(\psi)$, which contains all link functions for which the second-order Stein's identity does not vanish under the transformation $\psi$. The authors then propose a mixed model in (2.6) using two link functions $f_1 \in \mathcal{C}_1 \cap \mathcal{C}(\psi)$ and $f_2 \in \mathcal{C}_2 \cap \mathcal{C}(\psi)$. This model is finally used to derive the lower bound. This is reasonable, since the true $\beta$ with link function $f_1$ is easy to estimate (using the first-order Stein's identity), while the true $\beta$ with $f_2$ is indistinguishable. The minimax rate is established in Prop. 3.1.
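As a quick illustration of the first-order mechanism the review refers to: for Gaussian $X$, Stein's identity gives $\mathbb{E}[y X] = \mathbb{E}[f'(X^\top\beta^*)]\,\beta^*$, so the sample mean of $yX$ recovers the direction of $\beta^*$. The cubic link and the design below are assumptions made purely for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50_000, 10
beta = np.zeros(d)
beta[0] = 1.0                                # true unit-norm direction

X = rng.normal(size=(n, d))                  # standard Gaussian design
y = (X @ beta) ** 3 + rng.normal(size=n)     # link f(t) = t^3, so E[f'] = 3 > 0

# First-order Stein estimate: the sample mean of y*X is proportional to beta
score = (y[:, None] * X).mean(axis=0)
beta_hat = score / np.linalg.norm(score)

cosine = abs(beta_hat @ beta)                # alignment with the true direction
```

For an even link such as $f(t)=t^2$ the factor $\mathbb{E}[f']$ vanishes, and one must fall back on the second-order identity, which is the source of the tradeoff the paper studies.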


Conditional Diffusion Models Based Conditional Independence Testing

Yang, Yanfeng, Li, Shuai, Zhang, Yingjie, Sun, Zhuoran, Shu, Hai, Chen, Ziqi, Zhang, Renming

arXiv.org Machine Learning

Conditional independence (CI) testing is a fundamental task in modern statistics and machine learning. The conditional randomization test (CRT) was recently introduced to test whether two random variables, $X$ and $Y$, are conditionally independent given a potentially high-dimensional set of random variables, $Z$. The CRT performs exceptionally well under the assumption that the conditional distribution $X|Z$ is known. However, since this distribution is typically unknown in practice, accurately approximating it becomes crucial. In this paper, we propose using conditional diffusion models (CDMs) to learn the distribution of $X|Z$. Theoretically and empirically, it is shown that CDMs closely approximate the true conditional distribution. Furthermore, CDMs offer a more accurate approximation of $X|Z$ compared to GANs, potentially leading to a CRT that performs better than those based on GANs. To accommodate complex dependency structures, we utilize a computationally efficient classifier-based conditional mutual information (CMI) estimator as our test statistic. The proposed testing procedure performs effectively without requiring assumptions about specific distribution forms or feature dependencies, and is capable of handling mixed-type conditioning sets that include both continuous and discrete variables. Theoretical analysis shows that our proposed test achieves a valid control of the type-I error. A series of experiments on synthetic data demonstrates that our new test effectively controls both type-I and type-II errors, even in high-dimensional scenarios.
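A stripped-down CRT loop looks as follows. Here the conditional law $X|Z$ is a known Gaussian linear model so the resampling step is exact; in the paper that law is unknown and is learned with a conditional diffusion model, and the statistic is a classifier-based CMI estimator rather than a plain correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
Z = rng.normal(size=(n, d))
gamma = 0.3 * rng.normal(size=d)

def sample_x_given_z(z):
    # The known conditional law X | Z; in the paper this is unknown and is
    # approximated with a conditional diffusion model instead
    return z @ gamma + rng.normal(size=len(z))

X = sample_x_given_z(Z)
Y = 2.0 * X + rng.normal(size=n)   # Y depends directly on X, so H0 is false

def stat(x, y):
    # Simple statistic for illustration; the paper uses a classifier-based
    # conditional mutual information estimator
    return abs(np.corrcoef(x, y)[0, 1])

# CRT: compare the observed statistic against copies where X is redrawn
# from X | Z, which breaks any X-Y dependence beyond what Z explains
t_obs = stat(X, Y)
t_null = [stat(sample_x_given_z(Z), Y) for _ in range(200)]
p_value = (1 + sum(t >= t_obs for t in t_null)) / (1 + len(t_null))
```

The `(1 + ...)/(1 + ...)` form makes the p-value exactly valid for any finite number of resamples when $X|Z$ is sampled correctly, which is why the accuracy of the learned conditional model is the crux of the method.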


Enabling Mixed Effects Neural Networks for Diverse, Clustered Data Using Monte Carlo Methods

Tschalzev, Andrej, Nitschke, Paul, Kirchdorfer, Lukas, Lüdtke, Stefan, Bartelt, Christian, Stuckenschmidt, Heiner

arXiv.org Machine Learning

Neural networks often assume independence among input data samples, disregarding correlations arising from inherent clustering patterns in real-world datasets (e.g., due to different sites or repeated measurements). Recently, mixed effects neural networks (MENNs) which separate cluster-specific 'random effects' from cluster-invariant 'fixed effects' have been proposed to improve generalization and interpretability for clustered data. However, existing methods only allow for approximate quantification of cluster effects and are limited to regression and binary targets with only one clustering feature. We present MC-GMENN, a novel approach employing Monte Carlo methods to train Generalized Mixed Effects Neural Networks. We empirically demonstrate that MC-GMENN outperforms existing mixed effects deep learning models in terms of generalization performance, time complexity, and quantification of inter-cluster variance. Additionally, MC-GMENN is applicable to a wide range of datasets, including multi-class classification tasks with multiple high-cardinality categorical features. For these datasets, we show that MC-GMENN outperforms conventional encoding and embedding methods, simultaneously offering a principled methodology for interpreting the effects of clustering patterns.
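The Monte Carlo idea at the heart of this approach can be sketched for a plain logistic random-intercept model: the intractable marginal likelihood of each cluster is approximated by averaging the conditional likelihood over draws of the random effect. The network is replaced by a single fixed-effect coefficient, and the random-effect scale is assumed known, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, beta = 1.5, 0.8                 # random-effect scale and fixed effect
n_clusters, m = 30, 40
b = rng.normal(0, sigma, n_clusters)   # cluster-specific random intercepts
x = rng.normal(size=(n_clusters, m))
y = rng.binomial(1, 1 / (1 + np.exp(-(beta * x + b[:, None]))))

def mc_marginal_loglik(beta_hat, n_samples=2000):
    # Monte Carlo integration over the random effect -- the core idea behind
    # MC-GMENN, shown here for a plain logistic model rather than a network
    b_draws = rng.normal(0, sigma, n_samples)
    logits = beta_hat * x[:, :, None] + b_draws[None, None, :]
    # Bernoulli log-likelihood summed within each cluster, per MC draw
    logp = (y[:, :, None] * logits - np.log1p(np.exp(logits))).sum(axis=1)
    # log-mean-exp across draws gives each cluster's marginal log-likelihood
    mx = logp.max(axis=1, keepdims=True)
    return (mx[:, 0] + np.log(np.exp(logp - mx).mean(axis=1))).sum()

ll_true, ll_wrong = mc_marginal_loglik(0.8), mc_marginal_loglik(-0.8)
```

In the full method the same Monte Carlo objective is differentiated with respect to neural-network parameters, which is what permits multi-class targets and multiple high-cardinality clustering features.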


On Predictive planning and counterfactual learning in active inference

Paul, Aswin, Isomura, Takuya, Razi, Adeel

arXiv.org Artificial Intelligence

Defining and thereby separating the intelligent "agent" from its embodied "environment", which then provides feedback to the agent, is crucial to model intelligent behaviour. Popular approaches, like reinforcement learning (RL), heavily employ such models containing agent-environment loops, which boils the problem down to agent(s) trying to maximise reward in the given uncertain environment Sutton and Barto [2018]. Active inference has emerged in neuroscience as a biologically plausible framework Friston [2010], which adopts a different approach to modelling intelligent behaviour compared to other contemporary methods like RL. In the active inference framework, an agent accumulates and maximises the model evidence during its lifetime to perceive, learn, and make decisions Da Costa et al. [2020], Sajid et al. [2021], Millidge et al. [2020]. However, maximising the model evidence becomes challenging when the agent encounters a highly 'entropic' observation (i.e. an unexpected observation) concerning the agent's generative (world) model Da Costa et al. [2020], Sajid et al. [2021], Millidge et al. [2020]. This seemingly intractable objective of maximising model evidence (or minimising the entropy of encountered observations) is achievable by minimising an upper bound on the entropy of observations, called variational free energy Da Costa et al. [2020], Sajid et al. [2021]. Given this general foundation, active inference Friston et al. [2017] offers excellent flexibility in defining the generative model structure for a given problem and has attracted much attention in various domains Kuchling et al. [2020], Deane et al. [2020]. In this work, we develop an efficient decision-making scheme based on active inference by combining 'planning' and 'learning from experience'.
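The bound mentioned here is easy to verify numerically in a discrete toy model: the variational free energy $F(q) = \mathbb{E}_q[\log q(s) - \log p(o,s)]$ satisfies $F(q) \ge -\log p(o)$, with equality exactly when $q$ is the true posterior. The model below is an arbitrary made-up example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states = 4
prior = rng.dirichlet(np.ones(n_states))        # p(s): prior over hidden states
lik = rng.dirichlet(np.ones(3), size=n_states)  # p(o|s): 3 possible observations
o = 1                                           # the observation actually received

joint = prior * lik[:, o]                       # p(o, s) as a vector over s
evidence = joint.sum()                          # p(o), the model evidence
posterior = joint / evidence                    # exact Bayesian posterior p(s|o)

def free_energy(q):
    # Variational free energy: E_q[log q(s) - log p(o, s)]
    return (q * (np.log(q) - np.log(joint))).sum()

F_post = free_energy(posterior)                  # equals -log p(o) exactly
F_other = free_energy(np.ones(n_states) / n_states)  # any other q scores worse
```

Minimising $F$ over $q$ thus both recovers the posterior (perception) and tightens the bound on $-\log p(o)$, which is the sense in which free-energy minimisation "maximises model evidence".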


Increasing Trust in Language Models through the Reuse of Verified Circuits

Quirke, Philip, Neo, Clement, Barez, Fazl

arXiv.org Artificial Intelligence

Language Models (LMs) are increasingly used for a wide range of prediction tasks, but their training can often neglect rare edge cases, reducing their reliability. Here, we define a stringent standard of trustworthiness whereby the task algorithm and circuit implementation must be verified, accounting for edge cases, with no known failure modes. We show that a transformer model can be trained to meet this standard if built using mathematically and logically specified frameworks. In this paper, we fully verify a model for n-digit integer addition. To exhibit the reusability of verified modules, we insert the trained integer addition model into an untrained model and train the combined model to perform both addition and subtraction. We find extensive reuse of the addition circuits for both tasks, easing verification of the more complex subtractor model. We discuss how inserting verified task modules into LMs can leverage model reuse to improve verifiability and trustworthiness of language models built using them. The reuse of verified circuits reduces the effort to verify more complex composite models which we believe to be a significant step towards safety of language models.
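The "no known failure modes" standard is exhaustive rather than statistical; for small $n$ it can literally be checked by enumeration. The sketch below verifies a digit-wise carry algorithm (the kind of algorithm the addition circuit is claimed to implement) over every pair of 2-digit inputs; it illustrates the verification standard, not the transformer itself.

```python
def add_digits(a, b, n):
    # Digit-wise addition with an explicit carry, mirroring the algorithmic
    # specification a verified addition circuit would be checked against
    carry, out = 0, []
    for i in range(n):
        s = (a // 10 ** i) % 10 + (b // 10 ** i) % 10 + carry
        out.append(s % 10)
        carry = s // 10
    out.append(carry)  # final carry digit covers the 99 + 99 edge case
    return sum(d * 10 ** i for i, d in enumerate(out))

# Exhaustive verification over every n-digit pair: no sampling, no edge
# case left unchecked
n = 2
ok = all(add_digits(a, b, n) == a + b for a in range(100) for b in range(100))
```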


Joint Learning of Network Topology and Opinion Dynamics Based on Bandit Algorithms

Xing, Yu, Sun, Xudong, Johansson, Karl H.

arXiv.org Artificial Intelligence

We study joint learning of network topology and mixed opinion dynamics, in which agents may have different update rules. Such a model captures the diversity of real individual interactions. We propose a learning algorithm based on multi-armed bandit algorithms to address the problem. The goal of the algorithm is to find each agent's update rule from several candidate rules and to learn the underlying network. At each iteration, the algorithm assumes that each agent follows one of the candidate update rules and then modifies the network estimates to reduce validation error. Numerical experiments show that the proposed algorithm improves initial estimates of the network and update rules, decreases prediction error, and performs better than other methods such as sparse linear regression and Gaussian process regression.
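The bandit component can be sketched as follows: each candidate update rule is an arm, and the negative one-step prediction error on the observed trajectory is the reward. The two toy rules and the epsilon-greedy scheme below are stand-ins; the paper's algorithm additionally updates the network estimate jointly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two candidate update rules for one agent's scalar opinion given the
# average neighbor opinion nb (toy stand-ins for the paper's rule set)
rules = [
    lambda x, nb: 0.5 * x + 0.5 * nb,   # DeGroot-style averaging
    lambda x, nb: x,                    # stubborn agent: never updates
]
true_rule = 0

# Observed trajectory generated by the true rule
T = 200
nb = rng.uniform(-1, 1, T)
traj, x = [], 0.0
for t in range(T):
    traj.append(x)
    x = rules[true_rule](x, nb[t])

# Epsilon-greedy bandit: pull an arm (candidate rule), reward it by how well
# it predicts the next observed opinion
counts, sums = np.zeros(2), np.zeros(2)
for t in range(1, T):
    if counts.min() == 0 or rng.random() < 0.1:
        arm = int(rng.integers(2))            # explore
    else:
        arm = int(np.argmax(sums / counts))   # exploit best rule so far
    err = (rules[arm](traj[t - 1], nb[t - 1]) - traj[t]) ** 2
    counts[arm] += 1
    sums[arm] -= err

best_rule = int(np.argmax(counts))            # most-pulled arm = inferred rule
```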